One of the key challenges in deploying RL to real-world applications is adapting to variations in unknown environment contexts, such as changing terrains in robotic tasks and fluctuating bandwidth in congestion control. Existing works on adaptation to unknown environment contexts either assume the context stays the same for the whole episode or assume the context variables are Markovian. However, in many real-world applications, the environment context usually stays stable for a stochastic period and then changes in an abrupt and unpredictable manner within an episode, resulting in a segment structure, which existing works fail to address. To leverage the segment structure of piecewise-stable context in real-world applications, in this paper, we propose a \textit{\textbf{Se}gmented \textbf{C}ontext \textbf{B}elief \textbf{A}ugmented \textbf{D}eep~(SeCBAD)} RL method. Our method can jointly infer the belief distribution over the latent context with the posterior over segment length, and perform more accurate belief context inference using observed data within the current context segment. The inferred belief context can be leveraged to augment the state, leading to a policy that can adapt to abrupt variations in context. We demonstrate empirically that SeCBAD can infer context segment length accurately and outperform existing methods on a toy grid world environment and MuJoCo tasks with piecewise-stable context.
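As an illustration of the state-augmentation idea, the sketch below conditions a policy on the state concatenated with a simple parametric belief over the latent context (mean and variance). All names and shapes are assumptions for exposition and do not reflect SeCBAD's actual inference network.

```python
import torch
import torch.nn as nn

class BeliefAugmentedPolicy(nn.Module):
    """Policy conditioned on the state concatenated with a belief over the
    latent context, summarized here by its mean and variance.
    Hypothetical sketch; the paper's inference network is not shown."""

    def __init__(self, state_dim, context_dim, action_dim, hidden=128):
        super().__init__()
        # belief is represented by (mean, variance) of the latent context
        self.net = nn.Sequential(
            nn.Linear(state_dim + 2 * context_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state, belief_mean, belief_var):
        x = torch.cat([state, belief_mean, belief_var], dim=-1)
        return torch.tanh(self.net(x))  # continuous action in [-1, 1]
```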
Although many methods have been proposed to enhance the transferability of adversarial perturbations, these methods are designed in a heuristic manner, and the essential mechanism for improving adversarial transferability remains unclear. This paper summarizes, within a unified view, a common mechanism shared by twelve previous transferability-boosting methods, namely that these methods all reduce game-theoretic interactions between regional adversarial perturbations. To this end, we focus on the attacking utility of all interactions between regional adversarial perturbations, and we first discover and prove a negative correlation between adversarial transferability and the attacking utility of interactions. Based on this finding, we theoretically prove and empirically verify that the twelve previous transferability-boosting methods all reduce interactions between regional adversarial perturbations. More importantly, we consider the reduction of interactions as the essential reason for enhancing adversarial transferability. Furthermore, we design an interaction loss to directly penalize interactions between regional adversarial perturbations during the attack. Experimental results show that the interaction loss significantly improves the transferability of adversarial perturbations.
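As a rough illustration of what penalizing interactions can look like, the sketch below estimates the interaction between two perturbation regions as a second-order difference of an attack utility. The utility choice, masking, and lack of averaging over contexts are assumptions, not the paper's exact game-theoretic formulation.

```python
import torch

def pairwise_interaction(model, x, delta, region_a, region_b, label):
    """Estimate the interaction between two regions of an adversarial
    perturbation as  u(x + d_a + d_b) - u(x + d_a) - u(x + d_b) + u(x),
    where u is a simple attack utility (negated true-class logit).
    Illustrative only; region_a / region_b are binary masks."""
    def utility(inp):
        return -model(inp)[0, label]  # higher value = more adversarial

    d_a = delta * region_a            # keep only region a of the perturbation
    d_b = delta * region_b            # keep only region b of the perturbation
    return (utility(x + d_a + d_b) - utility(x + d_a)
            - utility(x + d_b) + utility(x))
```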
World models learn the consequences of actions in vision-based interactive systems. However, in practical scenarios such as autonomous driving, there often exist uncontrollable dynamics that are independent of the action signal, making it difficult to learn effective world models. To tackle this problem, we present a novel reinforcement learning approach named Iso-Dream, which improves the Dream-to-Control framework in two aspects. First, by optimizing inverse dynamics, we encourage the world model to learn the controllable and uncontrollable sources of spatiotemporal change on isolated state transition branches. Second, we optimize the behavior of the agent on the latent imaginations of the world model. Specifically, to estimate state values, we roll out the uncontrollable states into the future and associate them with the current controllable state. In this way, the isolation of dynamics sources can greatly benefit the agent's long-horizon decision-making; for example, a self-driving car can avoid potential risks by anticipating the movements of other vehicles. Experiments show that Iso-Dream can effectively decouple the mixed dynamics and remarkably outperforms existing approaches in a wide range of visual control and prediction domains.
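To make the decoupling idea concrete, here is a minimal sketch of an action-conditioned (controllable) branch, an action-free (uncontrollable) branch, and an inverse-dynamics head that predicts the action from consecutive controllable states. The architecture and names are assumptions, not Iso-Dream's actual model.

```python
import torch
import torch.nn as nn

class DecoupledDynamics(nn.Module):
    """Toy sketch of separate controllable / uncontrollable transition
    branches, with an inverse-dynamics head used to isolate the
    controllable part of the dynamics."""

    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.ctrl = nn.Sequential(            # controllable branch: uses the action
            nn.Linear(state_dim + action_dim, hidden), nn.ELU(),
            nn.Linear(hidden, state_dim))
        self.free = nn.Sequential(            # uncontrollable branch: action-free
            nn.Linear(state_dim, hidden), nn.ELU(),
            nn.Linear(hidden, state_dim))
        self.inverse = nn.Sequential(         # predicts the action from consecutive
            nn.Linear(2 * state_dim, hidden), nn.ELU(),  # controllable states
            nn.Linear(hidden, action_dim))

    def forward(self, s_ctrl, s_free, action):
        next_ctrl = self.ctrl(torch.cat([s_ctrl, action], dim=-1))
        next_free = self.free(s_free)
        action_hat = self.inverse(torch.cat([s_ctrl, next_ctrl], dim=-1))
        return next_ctrl, next_free, action_hat
```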
As a powerful modelling method, piecewise linear neural networks (PWLNNs) have proven successful in various fields, and most recently in deep learning. To apply PWLNN methods, both the representation and the learning have long been studied. In 1977, the canonical representation pioneered the works on shallow PWLNNs learned by incremental design, but applications to large-scale data were prohibitive. In 2010, the Rectified Linear Unit (ReLU) advocated the prevalence of PWLNNs in deep learning. Since then, PWLNNs have been successfully applied to a wide range of tasks and have achieved advantageous performance. In this primer, we systematically introduce the methodology of PWLNNs by grouping the works into shallow and deep networks. First, different PWLNN representation models are constructed with elaborated examples. With PWLNNs in hand, the evolution of learning algorithms for data is presented, and fundamental theoretical analyses follow for a deeper understanding. Then, representative applications are introduced together with discussions and outlooks.
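For intuition, a ReLU network is piecewise linear by construction: every layer applies an affine map followed by max(0, ·), so the input space is partitioned into regions on which the network computes a single linear function. A minimal example:

```python
import torch.nn as nn

# A ReLU MLP computes a continuous piecewise-linear function of its input:
# each Linear layer is affine, and each ReLU splits the input space into
# linear regions.
pwl_net = nn.Sequential(
    nn.Linear(2, 16),
    nn.ReLU(),
    nn.Linear(16, 16),
    nn.ReLU(),
    nn.Linear(16, 1),
)
```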
We theoretically analyze the typical learning performance of $\ell_{1}$-regularized linear regression ($\ell_1$-LinR) for Ising model selection using the replica method from statistical mechanics. For typical random regular graphs in the paramagnetic phase, an accurate estimate of the typical sample complexity of $\ell_1$-LinR is obtained. Remarkably, despite the model misspecification, $\ell_1$-LinR is model-selection consistent with the same order of sample complexity as $\ell_{1}$-regularized logistic regression ($\ell_1$-LogR), i.e., $M = \mathcal{O}\left(\log N\right)$, where $N$ is the number of variables of the Ising model. Moreover, we provide an efficient method to accurately predict the non-asymptotic behavior of $\ell_1$-LinR for moderate $M, N$, such as the precision and recall. Simulations show fairly good agreement between theoretical predictions and experimental results, even for graphs with many loops, which supports our findings. Although this paper mainly focuses on $\ell_1$-LinR, our method is readily applicable to precisely characterizing the typical learning performance of a wide class of $\ell_{1}$-regularized $M$-estimators, including $\ell_1$-LogR and interaction screening.
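As a concrete illustration of the estimator being analyzed, the sketch below uses off-the-shelf $\ell_1$-regularized linear regression to estimate the neighborhood of one spin from $\pm 1$ samples. The regularization strength and thresholding are assumptions for exposition, not values from the paper.

```python
import numpy as np
from sklearn.linear_model import Lasso

def estimate_neighborhood(samples, i, alpha=0.05):
    """Estimate the neighborhood of spin i in an Ising model via
    l1-regularized linear regression: regress s_i on the remaining spins
    and keep the variables with nonzero coefficients.
    samples: (M, N) array of +/-1 spin configurations."""
    y = samples[:, i]
    X = np.delete(samples, i, axis=1)
    coef = Lasso(alpha=alpha, fit_intercept=False).fit(X, y).coef_
    neighbors = np.flatnonzero(np.abs(coef) > 1e-8)
    # re-index back to the original variable numbering (column i was removed)
    return np.array([j if j < i else j + 1 for j in neighbors])
```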
Benefiting from its capability of exploiting intrinsic supervision information, contrastive learning has recently achieved promising performance in the field of deep graph clustering. However, we observe that two drawbacks of the positive and negative sample construction mechanisms limit the performance of existing algorithms from further improvement. 1) The quality of positive samples heavily depends on carefully designed data augmentations, while inappropriate data augmentations easily lead to semantic drift and indiscriminative positive samples. 2) The constructed negative samples are not reliable because they ignore important clustering information. To solve these problems, we propose a Cluster-guided Contrastive deep Graph Clustering network (CCGC) that mines the intrinsic supervision information in high-confidence clustering results. Specifically, instead of conducting complex node or edge perturbation, we construct two views of the graph by designing special Siamese encoders whose weights are not shared between the sibling sub-networks. Then, guided by the high-confidence clustering information, we carefully select and construct the positive samples from the same high-confidence cluster in the two views. Moreover, to construct semantically meaningful negative sample pairs, we regard the centers of different high-confidence clusters as negative samples, thus improving the discriminative capability and reliability of the constructed sample pairs. Lastly, we design an objective function that pulls together samples from the same cluster and pushes away those from other clusters by maximizing the cross-view cosine similarity between positive samples and minimizing it between negative samples. Extensive experimental results on six datasets demonstrate the effectiveness of CCGC compared with existing state-of-the-art algorithms.
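A toy version of the cluster-guided objective is sketched below: cross-view embeddings of high-confidence samples from the same cluster are pulled together via cosine similarity, while each embedding is pushed away from the centers of the other clusters. The exact weighting and batching in CCGC are assumptions here.

```python
import torch
import torch.nn.functional as F

def cluster_guided_loss(z1, z2, pos_idx, neg_centers):
    """Toy cluster-guided contrastive objective.
    z1, z2:       (n, d) embeddings from the two graph views
    pos_idx:      indices of high-confidence samples forming positive pairs
    neg_centers:  (k, d) centers of the other high-confidence clusters"""
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    c = F.normalize(neg_centers, dim=-1)
    pos = (z1[pos_idx] * z2[pos_idx]).sum(-1)   # cosine sim of positive pairs
    neg = z1 @ c.T                              # similarity to negative centers
    return -pos.mean() + neg.mean()             # maximize pos, minimize neg
```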
To generate high-quality rendered images for real-time applications, it is common to trace only a few samples per pixel (spp) at a lower resolution and then supersample to the high resolution. Based on the observation that pixels rendered at a low resolution are typically highly aliased, we present a novel method for neural supersampling based on ray tracing 1/4-spp samples at the high resolution. Our key insight is that the ray-traced samples at the target resolution are accurate and reliable, which turns supersampling into an interpolation problem. We present a mask-reinforced neural network to reconstruct and interpolate high-quality image sequences. First, a novel temporal accumulation network is introduced to compute the correlation between current and previous features, significantly improving their temporal stability. Then a reconstruction network based on a multi-scale U-Net with skip connections is adopted to reconstruct and generate the desired high-resolution image. Experimental results and comparisons show that our method produces higher-quality supersampling results than current state-of-the-art methods without increasing the total number of ray-tracing samples.
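The mask-reinforced idea can be pictured with a minimal sketch that concatenates the sparse ray-traced color buffer with a binary mask of sampled pixels and lets a small convolutional network interpolate the remaining pixels; this is an assumption-level illustration, not the paper's temporal-accumulation or U-Net architecture.

```python
import torch
import torch.nn as nn

class MaskReinforcedReconstructor(nn.Module):
    """Minimal sketch: sparse high-resolution samples plus a binary mask of
    which pixels were actually ray traced are fed to a small conv net that
    interpolates the missing pixels."""

    def __init__(self, channels=3, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels + 1, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, channels, 3, padding=1),
        )

    def forward(self, sparse_color, sample_mask):
        x = torch.cat([sparse_color, sample_mask], dim=1)  # (B, C+1, H, W)
        return self.net(x)
```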
Temporal sentence grounding (TSG) aims to identify the temporal boundary of a specific segment in an untrimmed video given a sentence query. All existing works first utilize a sparse sampling strategy to extract a fixed number of video frames and then conduct multi-modal interactions with the query sentence for reasoning. However, we argue that these methods have overlooked two indispensable issues: 1) Boundary bias: the annotated target segment generally refers to two specific frames as the corresponding start and end timestamps. The video downsampling process may lose these two frames and take adjacent irrelevant frames as new boundaries. 2) Reasoning bias: such incorrect new boundary frames also lead to a reasoning bias during frame-query interaction, reducing the generalization ability of the model. To alleviate the above limitations, in this paper, we propose a novel Siamese Sampling and Reasoning Network (SSRN) for TSG, which introduces a siamese sampling mechanism to generate additional contextual frames to enrich and refine the new boundaries. Specifically, a reasoning strategy is developed to learn the inter-relationship among these frames and generate soft labels on boundaries for more accurate frame-query reasoning. This mechanism is also able to supplement the absent consecutive visual semantics of the sampled sparse frames for fine-grained activity understanding. Extensive experiments demonstrate the effectiveness of SSRN on three challenging datasets.
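One way to picture soft boundary labels is the small helper below, which spreads hard start/end frame indices into Gaussian-smoothed labels so frames near the true boundary still receive partial supervision; the exact labeling scheme used in SSRN is an assumption here.

```python
import torch

def soft_boundary_labels(num_frames, start_idx, end_idx, sigma=1.0):
    """Turn hard start/end frame indices into soft (Gaussian-smoothed)
    boundary label distributions over the sampled frames."""
    t = torch.arange(num_frames, dtype=torch.float32)
    start = torch.exp(-0.5 * ((t - start_idx) / sigma) ** 2)
    end = torch.exp(-0.5 * ((t - end_idx) / sigma) ** 2)
    return start / start.sum(), end / end.sum()
```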
Representing and synthesizing novel views of real-world dynamic scenes from casual monocular videos is a long-standing problem. Existing solutions typically approach dynamic scenes by applying geometry techniques or utilizing temporal information between several adjacent frames, without considering the underlying background distribution in the entire scene or the transmittance over the ray dimension, which limits their performance on static and occluded areas. Our approach, $\textbf{D}$istribution-$\textbf{D}$riven neural radiance fields, offers high-quality view synthesis and a 3D solution to $\textbf{D}$etach the background from the entire $\textbf{D}$ynamic scene, and is hence called $\text{D}^4$NeRF. Specifically, it employs a neural representation to capture the scene distribution in the static background and a 6D-input NeRF to represent dynamic objects, respectively. Each ray sample is given an additional occlusion weight to indicate the transmittance lying in the static and dynamic components. We evaluate $\text{D}^4$NeRF on public dynamic scenes and on our urban driving scenes acquired from an autonomous-driving dataset. Extensive experiments demonstrate that our approach outperforms previous methods in rendering texture details and motion areas while also producing a clean static background. Our code will be released at https://github.com/Luciferbobo/D4NeRF.
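To illustrate how an occlusion weight can blend static and dynamic components along a ray, here is a hedged sketch of per-sample blending followed by standard alpha compositing; it is not $\text{D}^4$NeRF's exact formulation.

```python
import torch

def composite_ray(static_rgb, static_sigma, dyn_rgb, dyn_sigma, occ_w, deltas):
    """Blend static and dynamic branches per ray sample with an occlusion
    weight occ_w in [0, 1], then alpha-composite along the ray.
    static_rgb / dyn_rgb: (num_samples, 3); the rest: (num_samples,)."""
    sigma = occ_w * dyn_sigma + (1.0 - occ_w) * static_sigma
    rgb = occ_w[:, None] * dyn_rgb + (1.0 - occ_w)[:, None] * static_rgb
    alpha = 1.0 - torch.exp(-sigma * deltas)
    trans = torch.cumprod(
        torch.cat([torch.ones(1), 1.0 - alpha + 1e-10])[:-1], dim=0)
    weights = alpha * trans
    return (weights[:, None] * rgb).sum(dim=0)   # composited pixel color
```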
Deploying reliable deep learning techniques in interdisciplinary applications requires learned models to output accurate and (even more importantly) explainable predictions. Existing approaches typically explicate network outputs in a post-hoc fashion, under an implicit assumption that faithful explanations come from accurate predictions/classifications. We make the opposite claim: explanations boost (or even determine) classification. That is, end-to-end learning of explanation factors to augment discriminative representation extraction could be a more intuitive strategy to inversely assure fine-grained explainability, e.g., in neuroimaging and neuroscience studies with high-dimensional data containing noisy, redundant, and task-irrelevant information. In this paper, we propose such an explainable geometric deep network, dubbed NeuroExplainer, with applications to uncovering altered infant cortical development patterns associated with preterm birth. Given fundamental cortical attributes as network input, our NeuroExplainer adopts a hierarchical attention-decoding framework to learn fine-grained attentions and the corresponding discriminative representations to accurately recognize preterm infants from term-born infants at term-equivalent age. NeuroExplainer learns the hierarchical attention-decoding modules under subject-level weak supervision coupled with targeted regularizers deduced from domain knowledge regarding brain development. These prior-guided constraints implicitly maximize the explainability metrics (i.e., fidelity, sparsity, and stability) during network training, driving the learned network to output detailed explanations and accurate classifications. Experimental results on the public dHCP benchmark suggest that NeuroExplainer leads to quantitatively reliable explanation results that are qualitatively consistent with representative neuroimaging studies.
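As a simplified picture of prior-guided explainability constraints, the sketch below combines a classification loss with a sparsity penalty on learned attention maps; the weighting and the specific fidelity/stability regularizers used by NeuroExplainer are assumptions here.

```python
import torch
import torch.nn.functional as F

def explainability_regularized_loss(logits, labels, attention, lam=1e-3):
    """Classification loss plus a sparsity penalty on attention maps, so that
    the learned explanations stay focused rather than diffuse."""
    cls = F.cross_entropy(logits, labels)
    sparsity = attention.abs().mean()   # encourage sparse, focused attention
    return cls + lam * sparsity
```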